A Load-Aware Data Placement Policy on Cluster File System
Authors
Abstract
In a large-scale cluster system running many applications, uneven I/O workloads across the cluster and disk saturation on a subset of storage servers have become a severe performance bottleneck that degrades system I/O performance. As a result, system response time increases and system throughput drops drastically. In this paper, we present a load-aware data placement policy that distributes data across the storage servers according to the load on each server and automatically migrates data from heavily-loaded servers to lightly-loaded ones. The policy is adaptive and self-managing: it operates without any prior knowledge of application workload characteristics or of the capabilities of the storage servers, and it efficiently exploits the aggregate disk bandwidth of all storage servers. Performance evaluation shows that our policy improves aggregate I/O bandwidth by 10%-20% over a random data placement policy, especially under mixed workloads.
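The abstract describes two mechanisms: placing new data on the least-loaded server, and migrating data away from overloaded servers. The sketch below illustrates that idea in Python; the class name, the block-count load metric, and the migration threshold are illustrative assumptions, not the paper's actual algorithm.

```python
class LoadAwarePlacer:
    """Minimal sketch of a load-aware placement policy (hypothetical):
    new blocks go to the currently least-loaded server, and one block
    migrates from the heaviest to the lightest server whenever their
    load ratio exceeds a threshold."""

    def __init__(self, servers, migrate_threshold=2.0):
        # Load is approximated here as the number of blocks per server;
        # a real system would track I/O rates or queue depths instead.
        self.load = {s: 0 for s in servers}
        self.migrate_threshold = migrate_threshold

    def place(self, block_id):
        # Choose the lightest-loaded server for the new block.
        target = min(self.load, key=self.load.get)
        self.load[target] += 1
        return target

    def rebalance_step(self):
        # Migrate one block from the heaviest to the lightest server
        # if the imbalance exceeds the threshold; otherwise do nothing.
        heavy = max(self.load, key=self.load.get)
        light = min(self.load, key=self.load.get)
        if self.load[heavy] / max(self.load[light], 1) > self.migrate_threshold:
            self.load[heavy] -= 1
            self.load[light] += 1
            return heavy, light
        return None
```

Calling `rebalance_step` periodically in the background approximates the automatic migration the abstract describes, without requiring prior knowledge of workload characteristics.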
Related Papers
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. The current Hadoop and the existing Hadoop Distributed File System's rack-aware data placement strategy assume a homogeneous cluster in which each node has the same computing capacity and is assigned the same workload. Default Hadoop d...
An Experimental Evaluation of Performance of A Hadoop Cluster on Replica Management
Hadoop is an open-source implementation of the MapReduce framework in the realm of distributed processing. A Hadoop cluster is a unique type of computational cluster designed for storing and analyzing large datasets across a cluster of workstations. To handle data at massive scale, Hadoop relies on the Hadoop Distributed File System, termed HDFS. HDFS, similar to most distributed file systems, sh...
Performance Improvement of Map Reduce through Enhancement in Hadoop Block Placement Algorithm
In the last few years, a huge volume of data has been produced from multiple sources across the globe. Dealing with such a volume of data has given rise to the so-called "Big Data problem", which can be solved only with new computing paradigms and platforms; this led to the emergence of Apache Hadoop. Inspired by Google's private cluster platform, a few independent software developers develope...
Sorrento: A Self-Organizing Storage Cluster for Parallel Data-Intensive Applications
This paper describes the design and implementation of Sorrento, a self-organizing storage cluster built upon commodity components. Sorrento complements previous research on distributed file/storage systems by focusing on incremental expandability and manageability of the system, and on design choices that optimize the performance of parallel data-intensive applications with low write-sharing pat...
A Cyber-Physical, Data-Centric Cooling Energy Costs Reduction Approach for Big Data Analytics Cloud
The Big Data explosion and the surge in large-scale Big Data analytics cloud infrastructure have led to burgeoning energy costs and present a challenge to existing run-time cooling energy management techniques. T*GreenHDFS, a thermal-aware cloud file system, takes a novel, data-centric approach to reduce cooling energy costs. On the physical side, T*GreenHDFS is cognizant of the uneven thermal profi...